Abstract

A single queue incorporating a retransmission protocol is investigated, assuming that the sequence of per effort 
success probabilities in the Automatic Retransmission reQuest (ARQ) chain is a priori defined and no channel state 
information at the transmitter is available. A Markov Decision Problem with an average cost criterion is formulated 
where the possible actions are to either continue the retransmission process of an erroneous packet at the next 
time slot or to drop the packet and move on to the next packet awaiting transmission. The cost per slot is 
a linear combination of the current queue length and a penalty term in case dropping is chosen as action. The 
investigation seeks policies that provide the best possible average packet delay-dropping trade-off for Quality of 
Service guarantees. An optimal deterministic stationary policy is shown to exist, several structural properties of 
which are obtained. Based on that, a class of suboptimal < L,K >-policies is introduced. These suggest that it 
is almost optimal to use a K-truncated ARQ protocol as long as the queue length is lower than L, else send all 
packets in one shot. The work concludes with an evaluation of the optimal delay-dropping tradeoff using dynamic 
programming and a comparison between the optimal and suboptimal policies. 

Index Terms 

Automatic Retransmission reQuest Protocols, ARQ, Single Queue, Delay, Dropped Packets, Markov Decision 
Process, Dynamic Programming 



This work is supported by the German Federal Ministry of Education and Research as part of the ScaleNet project 01BU566. 
The authors are with the Fraunhofer German-Sino Mobile Communications Lab, Heinrich-Hertz-Institut, Einsteinufer 37, D-10587 Berlin, 
Germany. Tel/Fax: +49303 1002-860/-863, e-mail: {giovanidis, wunder, buehler}@hhi.de and with the Technical University of Berlin, 
Heinrich Hertz Chair for Mobile Communications, Werner-von-Siemens-Bau (HFT 6), Einsteinufer 25, D-10587 Berlin, Germany. 



submitted to the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS 






I. Introduction 

Retransmission protocols are applied in communications over fading channels to achieve reliability. The 
concept of such protocols is to detect erroneous packets at the receiver and then request retransmission 
for these data. The information whether the message has been detected as correct or erroneous is sent 
back to the transmitter via a binary feedback link using the signal NACK to request a new retransmission 
or ACK to declare that the packet is correctly received. In the latter case, the sender moves on to the 
first transmission of the next packet waiting in the buffer. In this way error-free communications can be 
guaranteed. However, reliability naturally comes at a cost. The costs include a waste of resources, e.g. 
power or spectrum, for the unsuccessful efforts, a reduction in transmission rate when error correction 
and detection parity bits are added to the original code, as well as a noteworthy increase in packet delay 
due to multiple channel reuses for the correct transmission of a single message. During the last decades, 
different types of retransmission protocols have been proposed in the literature (see [1], [2] and references 
therein) and adopted by standards of mobile networks such as UMTS, WiMAX and 3GPP LTE. These 
include the simple Stop-and-Wait ARQ (SW-ARQ) protocol, as well as Type-I or Type-II Hybrid-ARQ 
(HARQ) schemes, where Forward Error Correction codes (FEC) are used together with packet combining 
to enhance the protocols' performance, [3], [4], [5], [6], [7]. 

Several efforts to optimize ARQ protocols, in the sense of reducing necessary retransmission efforts 
with economy in resources at the same time, can be found in the recent literature. Power control per 
retransmission to maximize the throughput of ARQ protocols has been investigated in [8]. In [9] the 
optimal sequence of redundancy per effort is found based on a dynamic programming formulation. The 
work in [10] investigates the optimal power and rate allocation among SW-ARQ retransmissions, that 
guarantees average delay or throughput constraints, based on some channel state information (CSI) at the 
receiver. A trade-off between throughput and energy consumption when the CSI is partially observable is 
given in [11] through a semi-Markov Decision Process formulation, while delay- and overflow-aware joint 
rate and power adaptation for type-I and power adaptation for type-II HARQ protocols is provided in [12] 
and [13], respectively. Rather noteworthy is also the work in [14], where the average delay of a single 
user wireless communication system, which includes a buffer and incorporates SW-ARQ retransmissions, 
is optimized by combined power and rate control under average power constraints. Furthermore, in [15] 
optimal stopping arguments have been used to determine the optimal maximum retransmission number, 
with respect to a cost function which linearly combines throughput gain with delay and packet dropping 
costs. 

In the current work, a single queue incorporating an ARQ protocol is investigated. The per effort 
sequence of success probabilities $\{q_k\}$ in each ARQ round, up to correct packet reception, is a priori 
defined, the same approach found also in [16]. The motivation behind this is that the success probabilities 
generally depend on the amount of parity bits, type of modulation, transmission power and type of protocol 
used. When no channel state information is available, an optimal fixed choice of $\{q_k\}$ can be calculated 
offline (see also the investigations in [15] and [17]). Using a 2-dimensional state-space for the system 
(current queue length and retransmission number), we formulate a Markov Decision Problem (MDP) 
with average cost criterion [18] where at each state we have to choose between continuing to retransmit 
the erroneous packet and dropping it to proceed to the first transmission of the next packet waiting. The 
investigation here seeks optimal policies that provide the best possible tradeoff between average delay and 
percentage of dropped packets. These are two conflicting performance measures, both of great importance 
concerning Quality of Service (QoS) guarantees in communications. 

The model under study as well as the MDP formulation are presented in Sections II and III. Several 
structural properties of the optimal policy are proved in Section IV. Section V introduces a family of 
suboptimal policies, named here < L, K >-policies, which encapsulate the structural properties of Section 
IV and at the same time simplify the design considerably. The idea is that it is almost optimal to choose 
a truncated ARQ protocol having K as fixed maximum number of retransmissions and use it for reliable 
communications, as long as the queue length does not exceed a certain threshold L, after which the packets 
are sent in one shot and no more retransmissions take place. For the optimal policies on the other hand, 
the number of maximum allowable retransmissions is not constant but varies depending on the queue 
length. Evaluation of the optimal delay-dropping tradeoff and comparison between the optimal policy and 
the sub-optimal < L, K >-policies is presented in Section VI. Section VII concludes our work. Proofs of 
theorems from the main text are provided in the appendix. 

II. A Single Queue with Retransmissions 

Consider a single-server communication system, where data packets arriving from some information 
source are stored in a buffer while waiting for service. The time axis is considered time-slotted, all slots 
having equal length T. During each slot an arrival of a new packet may occur (or not) with probability 
$0 < \lambda < 1$ (respectively $1 - \lambda$), i.e. the arrival process is Bernoulli. The service rate is considered 
constant, $\mu = 1$ [packet/slot], and a transmission of a packet is attempted at the end of each time slot 
under the condition that the queue is not empty. 

Due to the stochastic nature of the wireless channel, communications are generally unreliable and errors 
may occur. We assume that with use of error detection codes, erroneous packets are detected at the receiver 
with probability 1. The transmitter is informed via a zero-delay error-free feedback link whether decoding 
of the packet has been successful or not. A sequence of random variables $X_n \in \{0, 1\}$, $n \in \mathbb{N}$, is used 
for the information fed back after decoding of a packet at slot $n$: $X_n = 0$ denotes no acknowledgement 
(NACK), whereas $X_n = 1$ denotes acknowledgement (ACK). It is noted here that in real systems, although the 
feedback link can be made rather reliable, the delay is often considerable. The current work does not 
address issues on delayed feedback to keep simplicity of the model. In case an ACK is fed back the 
packet is removed from the buffer and the first transmission of the next packet waiting in queue takes 
place during the next time slot (a nonpreemptive First Come First Serve service principle is assumed). 
On the other hand in case a NACK is fed back there exist two possibilities. Either the transmission of 
the erroneous packet is repeated at the next slot or the packet is discarded (we say that dropping occurs) 
and the next waiting packet is served during the $(n+1)$-th slot. The two possibilities may be considered 
as actions that a controller may choose in order to optimize the system performance. For each slot we 
further introduce the pair of random variables (r.v.'s) $(U_n, Z_n)$. The first one provides the current queue 
length, whereas the second one provides the current retransmission number at slot $n$. The pair takes values within 
the set $\mathbb{N} \times \mathbb{N}_+$, $\mathbb{N}$ and $\mathbb{N}_+$ being the sets of non-negative and strictly positive integers, respectively. 
As explained in the introduction there exist various types of Automatic Retransmission reQuest (ARQ) 
protocols used in practical communications systems. In this work we describe each ARQ protocol by the 
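one-step transition probability matrix [15] 

$$
\mathbf{P}_{ARQ} = \begin{pmatrix} q_1 & p_1 & 0 & 0 & \cdots \\ q_2 & 0 & p_2 & 0 & \cdots \\ q_3 & 0 & 0 & p_3 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \tag{1}
$$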

with unequal entries in general. The above matrix describes the random process $\{Z_n\}$, where, as 
mentioned before, $Z_n$ is the current number of retransmissions at time slot $n$. It refers to non-truncated 
protocols, which repeat transmissions as many times as necessary up to correct reception. It is typical 
of a specific type of random walk in one dimension, better known as a success run. We denote as a 
retransmission policy a specific choice of the sequence of success probabilities $\{q_k\}$. 
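
As a minimal sketch of the success-run structure in (1), the following Python function builds a finite version of the matrix from a list of success probabilities. The function name `success_run_matrix` and the treatment of the last state (forced return to the first attempt, used here only to keep the matrix finite) are assumptions made for illustration and are not part of the model above.

```python
def success_run_matrix(q):
    """Finite success-run transition matrix for success probabilities q_1..q_K.

    Row k (0-based index k-1) returns to state 1 with probability q_k and
    moves to state k+1 with probability p_k = 1 - q_k. The last state is
    forced back to state 1 (assumption: needed only to keep the matrix finite).
    """
    K = len(q)
    P = [[0.0] * K for _ in range(K)]
    for k in range(K):
        if k < K - 1:
            P[k][0] = q[k]              # success: back to the first attempt
            P[k][k + 1] = 1.0 - q[k]    # failure: next retransmission round
        else:
            P[k][0] = 1.0               # truncation of the chain
    return P


P = success_run_matrix([0.5, 0.6, 0.7, 0.8])
```

Each row sums to one, which mirrors the statement that the matrix is stochastic.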

The random variables $X_n$ and $Z_n$ are dependent. This is formally described as follows. The conditional 
probability of successful reception at stage $k$ equals $\Pr(Z_{n+1} = 1 \mid Z_n = k) = \Pr(X_n = 1 \mid Z_n = k) = q_k$ 
and that of failure $\Pr(Z_{n+1} = k+1 \mid Z_n = k) = \Pr(X_n = 0 \mid Z_n = k) = p_k$. Obviously $\{Z_n\}$ forms a Markov 
chain. On the other hand, the sequence $\{X_n\}$ depends on the number of time slots since the last ACK was 
received and is thus history dependent. Since the events are mutually exclusive and exhaustive, it holds that 
$p_k + q_k = 1$, $\forall k = 1, 2, \ldots$, and the matrix in (1) is stochastic. Conditions for its ergodicity can be found 
in [15]. The transition probability diagram is given in Fig. 2. In the following the notation $p(k)$ is often 
used instead of $p_k$ and $p(Z_n)$ instead of $p_{Z_n}$ for simplicity. 

The matrix in (1) can be used as a general description of different types of ARQ protocols if no 
CSI (accurate or estimated) is available at the transmitter at any time, hence no link adaptation to 
current channel conditions is performed. This is standard in the literature (see [19], [20] and [2]). 
ACK/NACK is the only feedback assumed, possibly combined with information about the channel fading 
statistics. The latter can be infrequently sent to the transmitter. The values of the sequence $\{q_k\}$ generally 
depend on the available knowledge about the channel, the resources allocated per trial k, the applied 
modulation and coding schemes, as well as the type of the protocol used, which may exploit - or not 
- information from previous erroneous efforts. For Stop-and-Wait protocols errors occur in the case of 
channel outages and the success probability function depends only on the allocated power and coding 
rate of the current transmission. An optimization problem formulated in [17] finds the optimal power 
allocation for each retransmission effort and hence the optimal retransmission policy, given delay and 
dropped packet penalties. In the case of HARQ protocols CSI is also not necessary. Upon detection 
of a transmission failure, a NACK signal is fed back and only redundant information is retransmitted. 
The receiver combines the soft information of original transmission with subsequent retransmissions 
to achieve a higher probability of successful decoding and also improve throughput performance. The 
sequence of success probabilities then depends on the way the soft combining is performed. In the case 
of Chase Combining the same packet is repeatedly sent and the receiver aggregates the retransmission 
energy. If Incremental Redundancy (IR) is applied redundancy bits are produced by using e.g. Rate 
Compatible Punctured Convolutional Codes (RCPC) and Turbo-codes. The redundancy bits are sent only 
when an error occurs and combined with the erroneous packets at the receiver increase the probability 
of correct reception step-wise. In [2] closed-form expressions for the success probabilities regarding the 
aforementioned protocols can be found, whereas for the case of HARQ with IR, the optimal partitioning 
of parity bits among retransmissions is found in [21]. In [9] the optimal Type II HARQ retransmission 
policy is found using dynamic programming. In all examples mentioned above, as well as in [16], the 
retransmission policy is fixed and does not vary adaptively with the channel conditions. 

III. A Discrete Time Markov Decision Problem with Average Cost Criterion 

In the following we describe the decision problem under study. Consider a Markov Decision Process [18] 
$\{T, \mathcal{S}, \mathcal{A}, \Pr(\cdot \mid S, A), c(S, A)\}$, with set of decision epochs $T = \{0, 1, \ldots\}$ and state space $\mathcal{S} = \mathbb{N} \times \mathbb{N}_+$, 
the state in our problem being the pair of random variables $S_n = (U_n, Z_n)$ at slot $n$. The action set is binary, 
$\mathcal{A} = \{0, 1\}$, $\Pr(\cdot \mid S, A)$ is the transition probability distribution of the system, which is conditioned on 
the current state and action, whereas $c$ is the system cost per time slot as a function of state $S_n$ and action $A_n$, 
$c : \mathbb{N} \times \mathbb{N}_+ \times \{0, 1\} \to \mathbb{R}_+$. 

At the beginning of slot $n$ the state information $S_n = (U_n, Z_n)$ is available at the controller. Using 
possibly some further knowledge over the entire history $h_n = \{S_1, A_1, \ldots, A_{n-1}\}$ of previous states and 
actions taken, the controller may choose between two actions, which will affect the packet transmission at 
the next slot $n+1$. Either choose action $A_n = 0$ and continue the retransmission cycle in the next slot in 
case a NACK is fed back ($X_n = 0$), or choose $A_n = 1$ and break the retransmission cycle irrespective of 
the result of decoding of the transmitted packet in $n$ ($X_n \in \{0, 1\}$) and begin in $n+1$ a new transmission 
round. It is important to note that choosing action $A_n = 1$ does not necessarily result in a dropped packet. 
The packet will actually be dropped only in the case of $X_n = 0$, which will occur with probability $p(Z_n)$. 

During slot $n$ an arrival of a new packet $a_n \in \{0, 1\}$ may occur with probability $\lambda$, which will be taken 
into account for the value of the queue length at the next state, together with the result of the current 
packet transmission $X_n$. Then for the state evolution $S_{n+1} = (U_{n+1}, Z_{n+1})$ 

$$
U_{n+1} = \left[ U_n - \max\{X_n, A_n\} \right]^+ + a_n \tag{2}
$$

$$
Z_{n+1} = \begin{cases} Z_n + 1 & \text{if } A_n = 0 \text{ and } X_n = 0 \\ 1 & \text{if } A_n = 0 \text{ and } X_n = 1 \\ 1 & \text{if } A_n = 1 \end{cases} \tag{3}
$$



The state transition probabilities are given in Table I. 
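
The state evolution described above can be sketched as a single-slot update. The function name `next_state` and its argument names are hypothetical; the empty-queue convention (the retransmission number is left unchanged while nothing is transmitted) is an assumption read off the $l = 0$ case of the optimality equations (8).

```python
def next_state(u, z, action, arrival, success):
    """One-slot update (U_n, Z_n) -> (U_{n+1}, Z_{n+1}).

    u: queue length, z: retransmission number (>= 1),
    action: 0 = continue, 1 = drop,
    arrival: 0 or 1 (a_n), success: 0 or 1 (X_n, ignored if the queue is empty).
    """
    if u == 0:
        # Nothing to transmit: only an arrival can change the state.
        return arrival, z
    if success == 1 or action == 1:
        # The packet leaves the queue (ACK received, or action 1 was chosen).
        return u - 1 + arrival, 1
    # NACK under "continue": the packet stays and the round counter grows.
    return u + arrival, z + 1
```

For example, `next_state(3, 2, 0, 1, 0)` models a failed second attempt with a simultaneous arrival, while `next_state(3, 2, 1, 1, 0)` models the same slot when dropping was chosen.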

In the scenario described above two measures are rather important for the performance of the single 
queue system. The first one is the average queue length $\bar{U}$, related by Little's Law $\bar{U}/\lambda = \bar{W}$ [22] to 
the average packet delay (average waiting time per packet in the buffer) $\bar{W}$. The second is the average 
number of dropped packets $\bar{\Delta}$ in case the ARQ cycle is interrupted by the controller prior to correct 
packet reception. A cost per time slot is introduced which equals the actual queue length, while dropping 
is incorporated as a penalty, weighted by $\delta > 0$. As mentioned earlier, dropping occurs if action $A_n = 1$ 
is chosen at slot $n$ and the packet is not correctly received ($X_n = 0$), i.e. the dropping cost term equals 
$\delta A_n (1 - X_n)$. Since the result of decoding is known at the end of each slot and the cost per slot depends 
on the random variable $X_n$, we use an expected cost, the expectation taken over the random disturbance 
$X_n$ (see [18, p. 20]). In this way the cost can be written as a function of the state and action taken 

$$
c_\delta\left((U_n, Z_n), A_n\right) = E_{X_n}\left[ U_n + \delta A_n (1 - X_n) \right] = U_n + \delta A_n \, p(Z_n) \tag{4}
$$

For the case where $U_n = 0$ we have $c_\delta\left((0, Z_n), A_n\right) = 0$. In this work we aim to find a deterministic 
stationary policy, which is optimal among all history dependent randomized policies $\pi \in \Pi^{HR}$, in terms 
of minimizing the average expected cost, for a certain value of the weight $\delta > 0$ and initial state $S_0$ 



$$
J_\pi(\delta, S_0) := \limsup_{N \to \infty} \frac{1}{N} E_\pi^{S_0}\left[ \sum_{n=1}^{N} c_\delta(S_n, A_n) \right] \tag{5}
$$



The optimal strategy $\pi^* \in \Pi^{HR}$ is defined as the one that satisfies $J_{\pi^*}(\delta, S_0) = J^*(\delta, S_0) \leq J_\pi(\delta, S_0)$, $\forall \pi \in \Pi^{HR}$. 
Since the model can be easily verified to be unichain [18, pp. 348-354] (the transition matrix of 
each stationary deterministic policy has a single positive recurrent class) the optimal average cost (if it 
exists) is equal for all initial states $S_0$. 

The limsup average cost may not be finite $\forall \pi \in \Pi^{HR}$. This situation occurs in two cases: (a) the 
chain in (1) is non-ergodic (see [15]) and hence there exists a positive probability for the ARQ chain 
never to return to state $Z_n = 1$, or (b) the arrival rate is greater than the minimum average service 
rate, $\lambda > \mu_{\min} := \left(1 + \sum_{k=1}^{\infty} p_1 \cdots p_k\right)^{-1}$ [15], [23], the latter defined as the inverse of the expected 
retransmission number, in case no dropping occurs. In such cases the queue is unstable and all queue 
states are recurrent-null or transient. However, there definitely exist policies for which the average cost 
remains finite. Two such families important for the investigation are provided in the following. 

Definition 1 A stationary policy truncating the ARQ protocol up to the K-th retransmission number for 
all states of the queue length, so that $A_n = 1$ if $Z_n \geq K$, is called a K-retransmitting policy. 

Definition 2 A policy which does not allow retransmissions after a certain queue length $L$, in other words 
$A_n = 1$ if $U_n \geq L$, is called an L-truncating policy. 

These two families definitely stabilize the single queue system. To see this, by appropriate choice of $K$ in 
the first case we can have $\mu = \left(1 + \sum_{k=1}^{K-1} p_1 \cdots p_k\right)^{-1} > \lambda > \mu_{\min}$. For the second case, since the arrival 
process is Bernoulli, the queue length remains upper bounded by $L$ (see (2) for $A_n = 1$ and $U_n = L$) for 
all $n$. 
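
The stability argument can be checked numerically with a short sketch. The function name `mean_service_rate` is hypothetical, and the code assumes that a K-retransmitting policy lets a packet leave after at most K attempts; with `K=None` the finite list `q` is used as an approximation of the untruncated sum that defines $\mu_{\min}$.

```python
def mean_service_rate(q, K=None):
    """Inverse of the expected number of transmission attempts per packet.

    q: finite list of success probabilities q_1, q_2, ...
    K: if given, a packet leaves after at most K attempts (assumed convention);
       if None, the whole list is used to approximate the untruncated case.
    """
    terms = len(q) if K is None else K - 1
    expected_attempts = 1.0
    prod = 1.0
    for k in range(terms):
        prod *= 1.0 - q[k]          # running product p_1 * ... * p_{k+1}
        expected_attempts += prod
    return 1.0 / expected_attempts


q = [0.5] * 50
mu_min = mean_service_rate(q)        # close to 0.5 for constant q_k = 0.5
mu_K2 = mean_service_rate(q, K=2)    # 1 / (1 + p_1) = 2/3
```

With these illustrative numbers an arrival rate of 0.6 exceeds `mu_min` but stays below `mu_K2`, so truncation at two attempts would stabilize the queue where the untruncated protocol would not.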



A. Vanishing Discount Approach 

The solution of Markov Decision Problems with an average expected cost criterion is often studied 
in the literature [18], [24], [25] as the limit behavior of $\beta$-discounted models when the discount factor 
$0 < \beta < 1$ tends to 1. The discounted expected cost related to $\beta$ is given by 



$$
J_{\pi,\beta}(\delta, S_0) = \lim_{N \to \infty} E_\pi^{S_0}\left[ \sum_{n=1}^{N} \beta^{n-1} c_\delta(S_n, A_n) \right] \tag{6}
$$



$J^*_\beta(\delta, S_0) \leq J_{\pi,\beta}(\delta, S_0)$, $\forall \pi \in \Pi^{HR}$, is the solution of the discounted minimization problem. We will from 
now on neglect the dependence on $\delta$ in notation. From [18, Th.8.10.7 and 8.10.9] we have that under 
mild assumptions which can be verified for the problem at hand 



$$
J^* = \lim_{\beta \to 1} (1 - \beta) J^*_\beta(S_0) \tag{7}
$$

for all $S_0 \in \mathcal{S}$. Furthermore, from [24, Th.3.8] the optimal policy $\pi^*$ is also limiting discount optimal in the 
sense that there exist sequences $\beta_m \to 1$ and $S_m \to S$ such that $\pi^*(S) = \lim_{m \to \infty} \pi^*_{\beta_m}(S_m)$, $\forall S \in \mathcal{S}$. 
We may thus focus on discounted cost optimality and get the solution to the average cost problem by passing 
to the limit $\beta \to 1$ in (7). See also the works [26] and [11]. 

For the discounted expected cost problem we provide the Bellman optimality equations of the problem 
at hand, for each state $S := (U, Z) = (l, k) \in \mathcal{S}$. These are written as $J^*_\beta(l, k) = T J^*_\beta(l, k)$ [18, pp. 
146-148], where 



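$$
\begin{aligned}
T J^*_\beta(l, k) &= \min\Big\{\, l + \beta\big[ q_k (1 - \lambda) J^*_\beta(l-1, 1) + q_k \lambda J^*_\beta(l, 1) \\
&\qquad\qquad + p_k (1 - \lambda) J^*_\beta(l, k+1) + p_k \lambda J^*_\beta(l+1, k+1) \big], \\
&\qquad\quad\; l + \delta p_k + \beta\big[ \lambda J^*_\beta(l, 1) + (1 - \lambda) J^*_\beta(l-1, 1) \big] \Big\}, && l \geq 1 \\
T J^*_\beta(0, k) &= \beta\big[ \lambda J^*_\beta(1, k) + (1 - \lambda) J^*_\beta(0, k) \big], && l = 0
\end{aligned} \tag{8}
$$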



The existence of an optimal solution to the above discounted optimality equations is guaranteed by 
the Banach fixed-point theorem [18, Th.6.2.3] and some further technical conditions [18, Th.6.10.4, 
Prop. 6.10.5] due to the unboundedness of the costs, which can be easily shown to be satisfied for the 
problem at hand. 
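
A single application of the operator $T$ in (8) can be sketched as follows. The function name `bellman_backup` and the dictionary-based representation of the value function are illustrative choices, not taken from the paper; ties between the two actions are resolved in favor of continuing, which is an arbitrary convention of this sketch.

```python
def bellman_backup(J, l, k, q, lam, delta, beta):
    """Apply the operator T of (8) at state (l, k).

    J: dict mapping (l, k) -> current value estimate; it must contain the
       successor states (l-1, 1), (l, 1), (l, k+1) and (l+1, k+1).
    q: dict mapping k -> success probability q_k.
    Returns (value, action) with action 0 = continue, 1 = drop.
    """
    if l == 0:
        # Empty queue: no cost, only an arrival can change the state.
        return beta * (lam * J[(1, k)] + (1 - lam) * J[(0, k)]), 0
    qk = q[k]
    pk = 1.0 - qk
    cont = l + beta * (qk * (1 - lam) * J[(l - 1, 1)] + qk * lam * J[(l, 1)]
                       + pk * (1 - lam) * J[(l, k + 1)] + pk * lam * J[(l + 1, k + 1)])
    drop = l + delta * pk + beta * (lam * J[(l, 1)] + (1 - lam) * J[(l - 1, 1)])
    return (cont, 0) if cont <= drop else (drop, 1)
```

With a constant value function the two branches differ only by the penalty term $\delta p_k$, so continuing is selected whenever $\delta p_k > 0$.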
B. Algorithmic Solution for Finite State Space: Value Iteration 

Using value iteration algorithms [18, pp. 160-161, pp. 364-365] we may find deterministic $\epsilon$-optimal 
policies for both discounted and average cost problems. The algorithms may be implemented only for 
finite state spaces. We argue here that bounding the maximum queue length by $L_{\max} > 0$ and the maximum 
allowable retransmission number by $K_{\max} > 0$ provides an approximation to the optimal solution which 
improves as $K_{\max}$ and $L_{\max}$ increase. In the following we provide two examples (see Table II) of average 
cost optimal policies for two different sets of success probabilities $\{q_k\}$, $k = 1, \ldots, K$. The discount factor 
is $\beta = 0.99$ and the Bernoulli arrival rate is $\lambda = 0.4$ [packet/sec]. A maximum number of $K_{\max} = 6$ 
retransmissions is allowed, the queue length is restricted to a length of $L_{\max} = 10$ packets and the packets 
are discarded ($A = 1$) if either $K_{\max}$ or $L_{\max}$ is exceeded. In the first example a monotone decreasing 
sequence of success probabilities is utilized, specifically having values $q_k = e^{-[\ldots] k}$, $k = 1, \ldots, 6$, and 
weight $\delta = 40$. In the second one the sequence of success probabilities is monotone increasing, $q_k = 
1 - e^{-[\ldots] k}$, $k = 1, \ldots, 6$, and $\delta = 4$. The examples above suggest that the optimal policy for the actual 
problem behaves monotonically along both the $l$- (queue length) and $k$- (number of efforts) axes. Observe 
that the 1's in the last column, which imply dropping and break the monotonicity in the second example, 
are simply a result of the finite state-space and will not appear in the actual problem with countably 
infinite state space. 
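
A minimal sketch of discounted value iteration on the truncated state space is given below, assuming that dropping is forced whenever $l = L_{\max}$ or $k = K_{\max}$. The function name `value_iteration`, the stopping tolerance and the decay rate 0.3 in the example sequence are illustrative assumptions and do not reproduce the settings behind Table II.

```python
import math


def value_iteration(q, lam, delta, beta, L_max, K_max, tol=1e-9, max_iter=100000):
    """Discounted value iteration for (8) on {0..L_max} x {1..K_max}.

    q[k-1] is the success probability of attempt k. Dropping is forced when
    l == L_max or k == K_max (finite-state truncation, an assumption here).
    Returns (J, policy) as dicts keyed by (l, k); 1 = drop, 0 = continue.
    """
    states = [(l, k) for l in range(L_max + 1) for k in range(1, K_max + 1)]
    J = {s: 0.0 for s in states}
    policy = {s: 0 for s in states}
    for _ in range(max_iter):
        J_new = {}
        for (l, k) in states:
            if l == 0:
                J_new[(l, k)] = beta * (lam * J[(1, k)] + (1 - lam) * J[(0, k)])
                policy[(l, k)] = 0
                continue
            pk = 1.0 - q[k - 1]
            drop = l + delta * pk + beta * (lam * J[(l, 1)] + (1 - lam) * J[(l - 1, 1)])
            if l == L_max or k == K_max:
                J_new[(l, k)], policy[(l, k)] = drop, 1
                continue
            cont = l + beta * ((1 - pk) * ((1 - lam) * J[(l - 1, 1)] + lam * J[(l, 1)])
                               + pk * ((1 - lam) * J[(l, k + 1)] + lam * J[(l + 1, k + 1)]))
            if cont <= drop:
                J_new[(l, k)], policy[(l, k)] = cont, 0
            else:
                J_new[(l, k)], policy[(l, k)] = drop, 1
        diff = max(abs(J_new[s] - J[s]) for s in states)
        J = J_new
        if diff < tol:
            break
    return J, policy


q = [math.exp(-0.3 * k) for k in range(1, 7)]   # illustrative decreasing sequence
J, policy = value_iteration(q, lam=0.4, delta=40.0, beta=0.99, L_max=10, K_max=6)
```

The returned `policy` can be printed as an $L_{\max} \times K_{\max}$ grid of 0/1 entries to inspect the monotone structure discussed above.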



IV. Structural Properties of Optimal Policies 

As the examples in Table II indicate, for certain scenarios, if it is optimal to drop at some specific queue 
length and number of retransmissions, it is optimal to drop as well for all greater queue states for the same 
$k$. The same result may be found for a fixed queue length and varying retransmission number. The 
analysis that follows will be based on the assumption that the sequence of conditional success probabilities 
$\{q_k\}$ is either monotone non-increasing or non-decreasing w.r.t. $k$. This assumption is supported by results 
in [17] for the optimal choice of conditional success probabilities in the SW-ARQ case as well as by the 
fact that $\{q_k\}$ is definitely non-decreasing in the case of Type-II HARQ. 

All proofs of the following Lemmata and Theorems can be found in the Appendix. For the proofs value 
and policy iteration methods [18] are used. Let us first provide some monotonicity properties of the value 
function. 

Lemma 1 For all states $(l, k) \in \mathcal{S}$, $m \in \mathbb{N}$ and iteration steps $n \in \{0, 1, \ldots\}$ 

$$
J^n(l, k) \leq J^n(l + m, k) \tag{9}
$$

Lemma 2 For all states $(l, k) \in \mathcal{S}$, $m \in \mathbb{N}$, iteration steps $n \in \{0, 1, \ldots\}$ and monotone non-increasing 
conditional success probabilities $q_1 \geq q_2 \geq \cdots$ it holds 

$$
J^n(l, k) \leq J^n(l, k + m) \tag{10}
$$

Combining the above two Lemmata we obtain 

Corollary 1 For all states $(l, k) \in \mathcal{S}$, $n \in \{0, 1, \ldots\}$ and non-increasing success probabilities 

$$
J^n(l, k) \geq J^n(l - 1, k) \geq [\ldots] \tag{11}
$$

We may now state the following Theorem. 

Theorem 1 Suppose $q_1 \geq q_2 \geq \cdots$. Given a fixed queue state $l$ and varying the retransmission number 
$k$, the optimal policy is of threshold type, i.e. there exists a critical state $(l, k_l)$, possibly dependent on $l$, 
such that dropping is optimal for $k \geq k_l$, while continuing is optimal for $k < k_l$. 

A proof of the monotone behavior of the optimal policy along the $l$-axis could not be attained, although 
all examples tested suggest that, given $k$, there exists some threshold queue length $l_k$ such that dropping 
is optimal $\forall l \geq l_k$ and continuing retransmissions is optimal for $l < l_k$. We can prove the following 
important property instead. 

Theorem 2 For the optimal policy $\pi^*$ there exists a threshold queue length $l^*_{th}$, such that $\forall l \geq l^*_{th}$, $\forall k$, 
it holds that $\pi^*(l, k) = 1$. Furthermore, the threshold in the $l$-axis is always finite and more specifically 

$$
l^*_{th} \leq \delta \max_k q_{k+1} + (1 - \lambda) \tag{12}
$$

Theorem 3 It is always optimal to drop, in other words $l^*_{th} = 0$, if 

$$
\delta < \frac{[\ldots]}{1 + (1 - \lambda) p_1 - \min_{k \geq 1} p_k} \tag{13}
$$

The bound is non-decreasing with $\lambda$ and tends to $\infty$ for $\lambda \to 1$. 

The results for the existence of a threshold in the $l$-axis in Theorem 2 and the optimality of the 
always-drop policy in Theorem 3 hold irrespective of the choice of success probabilities. For the case of monotone 
non-decreasing success probabilities we can prove by induction using Policy Iteration, analogously to 
Theorem 1, that dropping is optimal for $k < k_l$ and continuing for $k \geq k_l$, where the $k$-axis threshold is 
$l$-dependent. 
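
Whether a computed policy actually has the threshold form along the $k$-axis can be checked with a small helper. The function name `k_threshold` and the list representation of one policy row (fixed queue length $l$) are hypothetical; the convention that the threshold is the first retransmission number at which the action switches is an assumption of this sketch.

```python
def k_threshold(actions, drop_high=True):
    """Return the threshold k_l if the row of actions is of threshold type, else None.

    actions[i] is the action (0 = continue, 1 = drop) at retransmission
    number k = i + 1 for one fixed queue length l.
    drop_high=True expects the pattern 0..0 1..1 (drop from k_l onwards);
    drop_high=False expects 1..1 0..0 (continue from k_l onwards).
    If the action never switches, len(actions) + 1 is returned.
    """
    first, second = (0, 1) if drop_high else (1, 0)
    switch = next((i for i, a in enumerate(actions) if a == second), len(actions))
    before_ok = all(a == first for a in actions[:switch])
    after_ok = all(a == second for a in actions[switch:])
    return switch + 1 if (before_ok and after_ok) else None
```

For instance, the row `[0, 0, 1, 1]` has threshold 3 in the non-increasing case, whereas `[0, 1, 0, 1]` is not of threshold type.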

Theorem 4 Consider the case of monotone non-decreasing success probabilities $q_1 \leq q_2 \leq \cdots$ with 
$q_k \to 1$ as $k \to \infty$. For all states $(l, k) \in \mathcal{S}$, $m \in \mathbb{Z}_+$ the following two inequalities hold for the optimal 
policy $\pi^*$ 

$$
J_{\pi^*}(l, k) \geq J_{\pi^*}([\ldots]) \tag{14}
$$

$$
J_{\pi^*}(l, k) \geq J_{\pi^*}(l, k + m) \tag{15}
$$

Furthermore, given a fixed queue state $l$ and varying the retransmission number $k$, the optimal policy is 
of threshold type, i.e. there exists a critical state $(l, k_l)$, possibly dependent on $l$, such that dropping is 
optimal for $k < k_l$, while continuing is optimal for $k \geq k_l$. 

V. Design Rules for the Optimal Tradeoff 

The above theorems provide important structural properties of the optimal policy. We may identify 
two very important parameters that define the structure, namely the sequence of success probabilities 
per retransmission as well as the weighting factor $\delta > 0$, which plays the role of the penalty when 
dropping occurs. Especially $\delta$ has a crucial role in the upper bound provided for $l^*_{th}$. Specifically, Theorem 
2 suggests that it is optimal to use an ARQ protocol only up to a finite queue length $l^*_{th}$. When the queue 
exceeds the threshold, the packet is removed from the queue after the first effort regardless of the result 
of decoding. Since Bernoulli arrivals have been assumed, this keeps the queue finite for $\lambda < 1$ and the 
optimal policy truncates the buffer up to length $L = l^*_{th}$. This reduces our investigation to the family of 
L-truncating policies (see Def. 2). The theorems furthermore provide an upper bound for the queue length. 
The importance of the penalty weight $\delta$ is emphasized by this expression. An increase in $\delta$ represents 
an increase of the dropping cost, which results in an increase of the optimal finite buffer length. In this 
case we prefer to increase system reliability and reduce the ratio of dropped packets at the cost of higher 
packet delay. Theorem 3 presents a condition regarding $\delta$ for which dropping is always optimal. Finally, 
Theorems 1 and 4 prove the threshold behavior of the optimal policy on the $k$-axis. For $q_1 \geq q_2 \geq \cdots$ 
there exists a maximum positive integer $K = \max_l k_l$ such that it is always optimal to drop for $k \geq K$ 
irrespective of $l$. This motivates the search over the optimal K-retransmitting policy (see Def. 1), which 
belongs to the family of policies for which retransmissions are allowed up to a finite number of trials $K$. 
Observe by Theorem 4 that if $q_1 \leq q_2 \leq \cdots$ then $K$ is always unbounded (possibly constrained by the 
maximum queue length) since after some $l$-dependent threshold it is always optimal to continue. 

Combining the two above suboptimal policies will provide a good approximation of the optimal strategy. 
Note that there will exist special values of $\delta$ and success probability sequences for which this is the optimal 
solution as well. The performance of the L-truncating, K-retransmitting policies, which will from 
now on be named < L, K >-policies, will be analyzed in the following section. 
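
To make the combined rule concrete, the sketch below defines an < L, K >-policy and estimates its behavior by Monte Carlo simulation. The names `lk_policy` and `simulate_lk`, the seed handling, and the choice to report dropping as a fraction of arrived packets are assumptions for illustration; the policy is assumed to drop whenever $l \geq L$ or $k \geq K$.

```python
import random


def lk_policy(l, k, L, K):
    """Action of an <L, K>-policy: 1 (drop) if l >= L or k >= K, else 0 (continue)."""
    return 1 if (l >= L or k >= K) else 0


def simulate_lk(q, lam, L, K, slots=200000, seed=0):
    """Monte Carlo estimate of (average queue length, dropped fraction of arrivals).

    q[k-1] is the success probability of attempt k; len(q) must be at least K.
    """
    rng = random.Random(seed)
    u, z = 0, 1
    queue_sum = arrivals = drops = 0
    for _ in range(slots):
        queue_sum += u
        a = 1 if rng.random() < lam else 0
        arrivals += a
        if u == 0:
            u = a                      # empty queue: nothing is transmitted
            continue
        action = lk_policy(u, z, L, K)
        success = rng.random() < q[z - 1]
        if success or action == 1:
            if not success:
                drops += 1             # action 1 together with a NACK: packet is lost
            u, z = u - 1 + a, 1
        else:
            u, z = u + a, z + 1        # NACK under "continue"
    return queue_sum / slots, drops / max(arrivals, 1)
```

Calling `simulate_lk([0.5, 0.6, 0.7], lam=0.3, L=5, K=3)` returns one point of an empirical delay-dropping curve; sweeping `L` and `K` traces out the family.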

VI. Optimal and Suboptimal Delay-Dropping Tradeoffs 

Since from Theorem 2 the maximum queue length and consequently the maximum number of 
retransmissions are always bounded, the state space is finite and standard algorithms such as policy iteration or 
value iteration can be implemented to determine the optimal solution. In the following, policy iteration 
for variable values of the dropping cost is used to provide the optimal delay and dropping tradeoff 
(Fig. 3 for decreasing and Fig. 6 for increasing success probabilities), as well as the behavior of 
the delay, expressed as average queue length $\bar{U}$ (Fig. 4 and Fig. 7, respectively), and of the average dropping, 
$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} A_n \, p(Z_n)$ (Fig. 5 and Fig. 8, respectively), with respect to the dropping cost $\delta$. The scenario 
implemented has arrival rate $\lambda = 0.6$ and $\beta = 0.99$. The decreasing sequence of success probabilities equals 
$q_k = e^{-0.3k}$ with $\max_k q_k = 0.741$, the increasing sequence of success probabilities equals $q_k = 1 - e^{-0.5k}$ 
with $\min_k q_k = 0.394$, maximum retransmission number $K_{\max} = 10$ and queue length $L_{\max} = 40$. 
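
As a quick numerical check of the two sequences, the following sketch generates them and compares the extreme values with the quoted figures 0.741 and 0.394. The decay rates 0.3 and 0.5 are an assumption inferred from those quoted extrema, and the variable names are illustrative.

```python
import math

K_MAX = 10

# Assumed decay rates: chosen so that the extrema match the quoted 0.741 and 0.394.
q_dec = [math.exp(-0.3 * k) for k in range(1, K_MAX + 1)]        # decreasing in k
q_inc = [1.0 - math.exp(-0.5 * k) for k in range(1, K_MAX + 1)]  # increasing in k

max_dec = max(q_dec)   # attained at k = 1
min_inc = min(q_inc)   # attained at k = 1
```

Both extrema are attained at the first transmission, so the first attempt is the most reliable one in the decreasing case and the least reliable one in the increasing case.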
The results are compared to the < L, K >-policies suggested in the previous section, where an 
algorithm similar to policy iteration but with certain adaptations is implemented to find the optimal $L$ and 
$K$. The algorithm initializes with $\pi_0$, which is simply the policy that always chooses drop as optimal action. 
At each step $n$ the policy $\pi_n := \pi_{<L_n, K_n>}$ is evaluated (see [18, pp. 174-175]), where $< L_n, K_n >$ are the 
max queue length and retransmission number, and $J_{\pi_{<L_n, K_n>}}$ is obtained. There are four options: either 
end the algorithm or look for a lower $J_{\pi_{n+1}}$ by increasing $K_n$, $L_n$ or both by 1. By comparing 
these possibilities the algorithm evolves and terminates when $< L_n, K_n > = < L_{n+1}, K_{n+1} >$. 
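
The search over $L$ and $K$ described above can be sketched as a greedy procedure. Here `evaluate` is a hypothetical callable that returns the average cost of the < L, K >-policy (for instance from policy evaluation or simulation); the function name `search_lk`, the starting point and the caps on $L$ and $K$ are assumptions of this sketch rather than details given in the text.

```python
def search_lk(evaluate, L0=1, K0=1, L_cap=40, K_cap=10):
    """Greedy search over <L, K>.

    evaluate(L, K) must return the average cost of the <L, K>-policy.
    At each step the three neighbours (L+1, K), (L, K+1), (L+1, K+1) are
    compared with the current pair; the search stops when none is better.
    Returns ((L, K), cost).
    """
    L, K = L0, K0
    best = evaluate(L, K)
    while True:
        candidates = [(L + 1, K), (L, K + 1), (L + 1, K + 1)]
        candidates = [(l, k) for l, k in candidates if l <= L_cap and k <= K_cap]
        if not candidates:
            return (L, K), best
        cost, (l_new, k_new) = min((evaluate(l, k), (l, k)) for l, k in candidates)
        if cost >= best:
            return (L, K), best        # fourth option: end the algorithm
        (L, K), best = (l_new, k_new), cost


# Toy cost surface with a unique minimum at <5, 3>, used only to exercise the search.
result = search_lk(lambda L, K: (L - 5) ** 2 + (K - 3) ** 2)
```

Because the search only ever increases $L$ and $K$, it can stop at a local minimum of the evaluated cost; the toy surface above is chosen so that this does not happen.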

Observe that in both Fig. 3 and Fig. 6 the optimal delay-dropping tradeoff has a convex decreasing form: lower delay necessarily implies a higher percentage of dropped packets. On the other hand, for an arrival rate of $\lambda = 0.6$ and the given $\{q_k\}$, the dropped packets are only a rather small percentage of the transmitted packets ($< 3.2\%$ for decreasing success probabilities and $< 1.1\%$ for increasing). The delay (average dropping) increases (decreases) with respect to the dropping cost $\delta$, as Fig. 4 and Fig. 7 (Fig. 5 and Fig. 8) illustrate. Noteworthy is the fact that the maximum average number of dropped packets is much smaller in the case of increasing success probabilities compared to the decreasing case, although the first transmission effort has a rather low success probability $q_1 = 0.394$. From the plots it can be concluded that the $\langle L,K\rangle$-policies behave nearly optimally, which was to be expected since they are designed based on the optimal structural properties. Due to their simplicity they may be preferred to the actual optimal strategy: once the optimal retransmission number $K$ has been determined, ARQ is applied for $l < L$, irrespective of the queue state, while for $l \ge L$ a one-shot transmission is made. Since the queue length cannot increase beyond $L$, an interesting application could be, given $L$ as a design parameter for the maximum buffer length, to find the optimal maximum number of retransmissions of the ARQ chain.
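To make the rule concrete, the following Python sketch simulates a slotted queue under an $\langle L,K\rangle$-policy and returns one (average queue length, dropping fraction) pair. It is a hedged illustration rather than the procedure used for the plots: Bernoulli arrivals with at most one packet per slot are assumed, the average queue length stands in for delay, and `simulate_lk` is a hypothetical name.

```python
import random


def simulate_lk(L, K, q, lam, slots, seed=0):
    """Simulate the queue under an <L, K>-policy.

    q[k-1] is the success probability of attempt k (K <= len(q) is required).
    Returns (average queue length, fraction of departed packets that were dropped).
    """
    rng = random.Random(seed)
    queue, k = 0, 1
    queue_sum = departed = dropped = 0
    for _ in range(slots):
        queue_sum += queue
        arrival = rng.random() < lam
        if queue > 0:
            success = rng.random() < q[k - 1]
            # One-shot transmission once l >= L or the K-th attempt is reached.
            one_shot = not (queue < L and k < K)
            if success or one_shot:
                departed += 1
                if not success:
                    dropped += 1  # failed last attempt: the packet is dropped
                queue -= 1
                k = 1
            else:
                k += 1  # NACK and retransmission allowed: try again next slot
        if arrival:
            queue += 1
    return queue_sum / slots, dropped / max(departed, 1)
```

With $K = 1$ every packet gets a single attempt, so the dropping fraction should be close to $p_1$. Larger $K$ trades a lower dropping fraction for a longer queue.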

VII. Conclusions

We have considered in the current work a single queue which incorporates retransmissions of erroneous packets and can be dynamically controlled. The per effort success probabilities of the ARQ rounds are a priori defined after some possible offline optimization, since the model under study does not include channel state information at the transmitter. This is reasonable since the importance of ARQ lies in providing reliable communications simply with binary feedback. A Markov Decision Problem was formulated, where the action space is binary with possible actions (a) continue retransmission of erroneous packets and (b) drop the packet and receive a penalty $\delta$, while the cost per time slot is a linear combination of the current queue length and the penalty in case action (b) is chosen. Analysis of the structure of optimal policies $\pi^*$ has shown that there exists a queue length threshold $l^*$ after which retransmitting a packet is no longer optimal and the packets are sent in a single effort. Furthermore, given a queue length $l$, there exists a threshold $k_l$ such that, if it is optimal to drop for retransmission number $k_l$, the same is optimal $\forall k \ge k_l$, in the case of monotone decreasing success probabilities. The monotonicity is reversed for increasing success probabilities. The results have motivated the investigation of structurally simpler delay-dropping policies which allow retransmissions as long as the retransmission number and the queue length remain below fixed thresholds $k < K$ and $l < L$, to be calculated. These are called here $\langle L,K\rangle$-policies. The optimal delay-dropping tradeoff as well as the comparison of optimal and suboptimal strategies has been obtained using standard policy and value iteration algorithms and the results were illustrated in plots. These provide the possible average delay-dropping pairs to be obtained by appropriate choice of the penalty parameter $\delta$, depending on the demanded QoS. Furthermore the results support the near optimal behavior of the $\langle L,K\rangle$-policies and suggest a simple, flexible, almost optimal alternative for simultaneously reliable and delay-constrained communications.

Appendix 

The following notation is introduced as a means to reduce space. For value iteration we write $T J^n(l,k) = \min\{C_1^{n+1}(l,k),\, C_2^{n+1}(l,k)\}$, while the index $n$ is replaced by $\pi_n$ for the policy iteration steps. The dependence of the values $J^n_\beta(l,k)$ in value iteration and $J_{\pi_n,\beta}$ in policy iteration on $\beta$ is omitted.



$$C_1^{n+1}(l,k) = l + \beta\left[q_k(1-\lambda)J^n(l-1,1) + q_k\lambda J^n(l,1) + p_k(1-\lambda)J^n(l,k+1) + p_k\lambda J^n(l+1,k+1)\right]$$
$$C_2^{n+1}(l,k) = l + \delta p_k + \beta\left[\lambda J^n(l,1) + (1-\lambda)J^n(l-1,1)\right]$$

Furthermore, rather useful for the analysis is the difference 

$$\Delta J^{n+1}(l,k) = C_1^{n+1}(l,k) - C_2^{n+1}(l,k) = -\delta p_k + \beta\left\{p_k(1-\lambda)\left[J^n(l,k+1) - J^n(l-1,1)\right] + p_k\lambda\left[J^n(l+1,k+1) - J^n(l,1)\right]\right\} \qquad (16)$$
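The operator $T$ built from the 'continue' and 'drop' terms can be sketched in Python as below. The sketch rests on assumptions that are not stated in this appendix: the state space is truncated at a hypothetical `l_max` (arrivals to a full buffer are lost), attempt $K = $ `len(q)` is forced to be the last one, and the empty queue incurs no cost. The name `value_iteration` is illustrative only.

```python
def value_iteration(q, lam, delta, beta, l_max, sweeps=400):
    """Iterate J <- T J on states (l, k), l = 0..l_max, k = 1..K, from J = 0.

    q[k-1] is the success probability of the k-th attempt. Returns (J, policy)
    with policy[l][k] = 1 for drop and 0 for continue; index k = 0 is unused.
    """
    K = len(q)
    p = [1.0 - qk for qk in q]
    J = [[0.0] * (K + 1) for _ in range(l_max + 1)]
    policy = [[1] * (K + 1) for _ in range(l_max + 1)]
    for _ in range(sweeps):
        new = [[0.0] * (K + 1) for _ in range(l_max + 1)]
        for k in range(1, K + 1):
            # Empty queue: no cost, only an arrival changes the state.
            new[0][k] = beta * (lam * J[1][1] + (1.0 - lam) * J[0][k])
        for l in range(1, l_max + 1):
            up = min(l + 1, l_max)  # assumption: arrivals to a full buffer are lost
            served = lam * J[l][1] + (1.0 - lam) * J[l - 1][1]
            for k in range(1, K + 1):
                c2 = l + delta * p[k - 1] + beta * served  # 'drop' term
                if k == K:
                    c1 = float("inf")  # assumption: attempt K is the last one
                else:
                    failed = lam * J[up][k + 1] + (1.0 - lam) * J[l][k + 1]
                    c1 = l + beta * (q[k - 1] * served + p[k - 1] * failed)  # 'continue' term
                policy[l][k] = 1 if c2 <= c1 else 0
                new[l][k] = min(c1, c2)
        J = new
    return J, policy
```

The sign of `c1 - c2` is exactly the sign of the difference in (16), so the returned `policy` can be inspected for the threshold structure discussed in the proofs.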

A. Monotonicity Properties 

We always choose $J^0(l,k) = 0$, $\forall (l,k)$.

Proof: [Proof of Lemma 1] The inequality holds for $n = 0$. Suppose now that the inequality holds for iteration step $n$. We have to show that it also holds for $J^{n+1}(l,k) = TJ^n(l,k)$, with $T$ the Dynamic Programming operator given in (8). We distinguish between two cases. (i) Suppose first that the 'continue' term $C_1^{n+1}(l+m,k)$ is the minimum. Then

$$TJ^n(l+m,k) = C_1^{n+1}(l+m,k) \ge C_1^{n+1}(l,k) \ge \min\{C_1^{n+1}(l,k),\, C_2^{n+1}(l,k)\} = TJ^n(l,k)$$

(ii) In the same fashion, we reach the above inequality when the 'drop' term is the minimum, by replacing $C_1^{n+1}(l+m,k)$ by $C_2^{n+1}(l+m,k)$. Thus we have proved that in both cases $J^{n+1}(l,k) \le J^{n+1}(l+m,k)$. ■



submitted to the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS 






Proof: [Proof of Lemma 2] The inequality certainly holds for $n = 0$. Suppose hypothesis (10) holds for some $n$. We proceed as in the proof of Lemma 1 and distinguish here the cases where the minimum in $TJ^n(l,k+m)$ is (i) $C_1^{n+1}(l,k+m)$ and (ii) $C_2^{n+1}(l,k+m)$. Then

$$TJ^n(l,k) \le C_1^{n+1}(l,k) \overset{(a)}{\le} C_1^{n+1}(l,k+m) = TJ^n(l,k+m)$$

where (a) holds from hypothesis (10), $p_{k+m} \ge p_k$, and under the condition that $(1-\lambda)\left[J^n(l,k+1) - J^n(l-1,1)\right] + \lambda\left[J^n(l+1,k+1) - J^n(l,1)\right] \ge 0$, which is satisfied since $J^n(l,k+1) \ge J^n(l-1,k+1) \ge J^n(l-1,1)$. For case (ii) the above inequality also holds, replacing index 1 by 2, and (a) follows simply from the fact that $p_{k+m} \ge p_k$. Then we have proved that if $q_1 \ge q_2 \ge \cdots$ inequality (10) holds $\forall n$. ■



B. Proof of Theorem 1

Proof: Suppose continue is optimal for some state $(l,\hat{k})$. Then using (16) we have $\Delta J^{n+1}(l,\hat{k}) \le 0$. We have to prove that $\Delta J^{n+1}(l,k) \le 0$, $\forall k \le \hat{k}$. From (16) we have

$$(1-\lambda)J^n(l,\hat{k}+1) + \lambda J^n(l+1,\hat{k}+1) \le \delta/\beta + (1-\lambda)J^n(l-1,1) + \lambda J^n(l,1)$$

Using Lemma 2 for the left-hand side we have $(1-\lambda)J^n(l,k+1) + \lambda J^n(l+1,k+1) \le (1-\lambda)J^n(l,\hat{k}+1) + \lambda J^n(l+1,\hat{k}+1)$, $\forall k \le \hat{k}$. Then there exists some maximum threshold value $k_l \ge 1$ such that continue is optimal for $k \le k_l$ and drop for $k > k_l$. The result holds $\forall n$ and consequently also for the optimal discounted and average reward policy as $n \to \infty$. ■

C. Proof of Theorem 2

Proof: Consider a sequence $\pi_n$, $n = 1, 2, \ldots$ of policies generated by the policy iteration algorithm which converge to $\pi^*$ as $n \to \infty$. Suppose that for a certain $n$ the policy $\pi_n$ has the following two properties

• P1: $\pi_n$ has a threshold value $l^*_{\pi_n}$ such that $\pi_n(l,k) = 1$, $\forall l \ge l^*_{\pi_n}$ and $\forall k$
• P2: The following inequalities hold $\forall l \ge l^*_{\pi_n}$

$$J_{\pi_n}(l+2,1) - J_{\pi_n}(l+1,1) \ge J_{\pi_n}(l+1,1) - J_{\pi_n}(l,1)$$
$$J_{\pi_n}(l+1,1) \ge J_{\pi_n}(l,1)$$

(P2) implies that the difference $J_{\pi_n}(l,1) - J_{\pi_n}(l-1,1)$ is non-negative and monotone non-decreasing for the specified set of states.

We will prove that each policy $\pi_n$ as defined above generates a $\pi_{n+1}$ with properties (P1) and (P2). Thus the optimal policy $\pi^*$ exhibits the same behavior.
We first have to provide a $\pi_0$ that satisfies (P1) and (P2) and initialize the policy iteration for $n = 0$. Consider the policy $\pi_0$ for which $\pi_0(l,k) = 1$, $\forall (l,k)$. In this case only a single transmission is allowed irrespective of the queue and retransmission state. The policy obviously has threshold $l^*_{\pi_0} = 0$ and property (P1) is fulfilled. We have to show that property (P2) also holds. This we will prove by induction. We first show the following inequality is satisfied for $l = 0$

$$J_{\pi_0}(2,1) - J_{\pi_0}(1,1) \ge J_{\pi_0}(1,1) - J_{\pi_0}(0,1)$$
Since dropping always takes place, using the expressions for $C_2(2,1)$, $C_2(1,1)$, $C_2(0,1)$

$$J_{\pi_0}(2,1) - J_{\pi_0}(1,1) = 1 + \beta\lambda\left(J_{\pi_0}(2,1) - J_{\pi_0}(1,1)\right) + \beta(1-\lambda)\left(J_{\pi_0}(1,1) - J_{\pi_0}(0,1)\right)$$
$$J_{\pi_0}(1,1) - J_{\pi_0}(0,1) = 1 + \delta p_1$$

Solving the first equation for $J_{\pi_0}(2,1) - J_{\pi_0}(1,1)$ and taking the difference of the above two

$$J_{\pi_0}(2,1) - 2J_{\pi_0}(1,1) + J_{\pi_0}(0,1) = \frac{1 + \beta(1-\lambda)(1+\delta p_1)}{1-\beta\lambda} - (1+\delta p_1) = \frac{1 - (1-\beta)(1+\delta p_1)}{1-\beta\lambda}$$

The above expression is positive if $\beta > 1 - \frac{1}{1+\delta p_1}$, hence the inequality holds for $\beta$ sufficiently close to 1, as $\delta$ ranges from 0 to $\infty$. Given now that the inequality in (P2) holds for $l - 1$, by induction we prove it also holds for $l$.



$$J_{\pi_0}(l+2,1) - J_{\pi_0}(l+1,1) = \frac{1}{1-\beta\lambda}\left[1 + \beta(1-\lambda)\left(J_{\pi_0}(l+1,1) - J_{\pi_0}(l,1)\right)\right] \ge \frac{1}{1-\beta\lambda}\left[1 + \beta(1-\lambda)\left(J_{\pi_0}(l,1) - J_{\pi_0}(l-1,1)\right)\right] = J_{\pi_0}(l+1,1) - J_{\pi_0}(l,1)$$

Furthermore, since $J_{\pi_0}(l+1,1) - J_{\pi_0}(l,1) \ge J_{\pi_0}(1,1) - J_{\pi_0}(0,1) > 0$, the second inequality of (P2) is also proved to be true. Thus, both properties hold for $\pi_0$, which we can use to initialize the policy iteration algorithm.
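The induction for the always-drop policy can be checked numerically. The sketch below solves the policy-evaluation equations of $\pi_0$ in closed form, using $J(0,1) = \beta[\lambda J(1,1) + (1-\lambda)J(0,1)]$ for the empty queue as in the derivation above. The helper name `always_drop_values` is hypothetical, and the parameter values in the usage note are chosen only for illustration.

```python
def always_drop_values(lam, delta, p1, beta, l_top):
    """Discounted cost J(l, 1), l = 0..l_top, when every packet gets one attempt.

    l = 0:  J(0) = beta * (lam * J(1) + (1 - lam) * J(0))
    l >= 1: J(l) = l + delta * p1 + beta * (lam * J(l) + (1 - lam) * J(l - 1))
    """
    # Solve the coupled equations for l = 0 and l = 1 first: J(0) = a * J(1).
    a = beta * lam / (1.0 - beta * (1.0 - lam))
    j1 = (1.0 + delta * p1) / (1.0 - beta * lam - beta * (1.0 - lam) * a)
    values = [a * j1, j1]
    # For l >= 2 each value depends only on the previous one.
    for l in range(2, l_top + 1):
        prev = values[-1]
        values.append((l + delta * p1 + beta * (1.0 - lam) * prev) / (1.0 - beta * lam))
    return values


def first_differences(values):
    """Return J(l + 1) - J(l) for consecutive queue lengths."""
    return [b - a for a, b in zip(values, values[1:])]
```

With `lam = 0.6`, `delta = 50`, `p1 = 0.26` the condition $\beta > 1 - 1/(1+\delta p_1)$ requires $\beta$ above roughly 0.93. For `beta = 0.99` the first differences are non-decreasing, while for `beta = 0.5` the second difference at $l = 0$ is negative.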

We further continue using induction. Choose $l \ge l^*_{\pi_n} + 1$. Since we assume that $\pi_n$ satisfies (P1) and (P2), this implies that dropping occurs for the states $(l+1,k+1)$ as well as $(l,k+1)$.

$$J_{\pi_n}(l,k+1) - J_{\pi_n}(l-1,1) = C_2^{\pi_n}(l,k+1) - J_{\pi_n}(l-1,1) = l + \delta p_{k+1} + \lambda\left[\beta J_{\pi_n}(l,1) - J_{\pi_n}(l-1,1)\right] + (1-\lambda)\left[\beta J_{\pi_n}(l-1,1) - J_{\pi_n}(l-1,1)\right] \qquad (17)$$

Observe now that for $\beta \to 1$, the last term vanishes. Since the difference $J_{\pi_n}(l,1) - J_{\pi_n}(l-1,1)$ is by (P2) non-decreasing for $l \ge l^*_{\pi_n} + 1$, then so is $J_{\pi_n}(l,k+1) - J_{\pi_n}(l-1,1)$. The same result holds for all queue states greater than $l$.

Let us now proceed to the Policy Improvement step of the Policy Iteration algorithm. Using the previous observation and the expression of $\Delta J_{\pi_{n+1}}(l,k)$ from (16)

$$\Delta J_{\pi_{n+1}}(l,k) = -\delta p_k + \beta p_k(1-\lambda)\left[J_{\pi_n}(l,k+1) - J_{\pi_n}(l-1,1)\right] + \beta p_k\lambda\left[J_{\pi_n}(l+1,k+1) - J_{\pi_n}(l,1)\right] \qquad (18)$$

we conclude that $\Delta J_{\pi_{n+1}}(l,k)$ is non-decreasing w.r.t. $l$.

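We cannot include the case $l = l^*_{\pi_n}$ in the above analysis since we do not know whether $J_{\pi_n}(l^*_{\pi_n}+1,1) - J_{\pi_n}(l^*_{\pi_n},1) \ge J_{\pi_n}(l^*_{\pi_n},1) - J_{\pi_n}(l^*_{\pi_n}-1,1)$. This we first have to prove. We know that for $\pi_n$ dropping occurs for $l = l^*_{\pi_n}$ and $l = l^*_{\pi_n}+1$. Then, for $\beta \to 1$,

$$J_{\pi_n}(l^*_{\pi_n}+1,1) - J_{\pi_n}(l^*_{\pi_n},1) \overset{(P1)}{=} J_{\pi_n}(l^*_{\pi_n},1) - J_{\pi_n}(l^*_{\pi_n}-1,1) + \frac{1}{1-\beta\lambda} \qquad (19)$$

and we conclude $\Delta J_{\pi_{n+1}}(l^*_{\pi_n}+1,k) \ge \Delta J_{\pi_{n+1}}(l^*_{\pi_n},k)$.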

Then for each $k$ there exists a threshold $l^{*,k}_{\pi_{n+1}}$ such that $\pi_{n+1}(l,k) = 1$ for $l \ge l^{*,k}_{\pi_{n+1}}$. Since the expression in (17) is non-decreasing and unbounded, the threshold for $\pi_{n+1}$ is always finite $\forall k$. We have

$$l^*_{\pi_{n+1}} = \max_k\, l^{*,k}_{\pi_{n+1}}$$

The threshold defined in this way may either stay the same as in $\pi_n$ or increase (by a finite number of queue states). This we will use to verify (P2) for $\pi_{n+1}$.

Since the threshold satisfies $l^*_{\pi_{n+1}} \ge l^*_{\pi_n}$, for all $(l,k)$ with $l \ge l^*_{\pi_{n+1}}$ dropping occurs according to policy $\pi_n$. Then $\forall l \ge l^*_{\pi_{n+1}}$ it holds

$$2\,J_{\pi_{n+1}}(l+1,1) = 2\,C_2^{\pi_{n+1}}(l+1,1) \le C_2^{\pi_{n+1}}(l+2,1) + C_2^{\pi_{n+1}}(l,1) = J_{\pi_{n+1}}(l+2,1) + J_{\pi_{n+1}}(l,1)$$

and the inequality follows assuming that (P2) holds for $\pi_n$. For the second inequality we have for $l \ge l^*_{\pi_{n+1}}$, using the expression in (17) and $\beta \approx 1$

$$J_{\pi_{n+1}}(l+1,1) - J_{\pi_{n+1}}(l,1) = C_2^{\pi_{n+1}}(l+1,1) - C_2^{\pi_{n+1}}(l,1) = 1 + \left[\lambda J_{\pi_n}(l+1,1) + (1-\lambda)J_{\pi_n}(l,1)\right] - \left[\lambda J_{\pi_n}(l,1) + (1-\lambda)J_{\pi_n}(l-1,1)\right] \ge J_{\pi_n}(l+1,1) - J_{\pi_n}(l,1) \overset{(P2)}{\ge} 0$$

Hence policy $\pi_{n+1}$ shares the same properties as $\pi_n$ and the proof of the first part of the proposition is complete. For the second part, using (18) together with (17) and $\beta \to 1$ we can write after some manipulations

$$\Delta J_{\pi_{n+1}}(l^*_{\pi_n},k) = -\delta p_k + p_k\left(l^*_{\pi_n} + \delta p_{k+1}\right) + p_k\lambda\left[J_{\pi_n}(l^*_{\pi_n}+1,1) - J_{\pi_n}(l^*_{\pi_n},1)\right] - p_k(1-\lambda)$$

If $\Delta J_{\pi_{n+1}}(l^*_{\pi_n},k) \ge 0$ then the threshold will stay the same for $\pi_{n+1}$. This reduces to

$$l^*_{\pi_n} + \lambda\left[J_{\pi_n}(l^*_{\pi_n}+1,1) - J_{\pi_n}(l^*_{\pi_n},1)\right] \ge \delta q_{k+1} + (1-\lambda)$$

The left-hand side is an expression that depends only on $l^*_{\pi_n}$, whereas the right-hand side is a constant that depends on system parameters. The inequality will definitely be satisfied $\forall k$ for $l^*_{\pi_n} \ge \delta \max_k q_{k+1} + (1-\lambda)$, since by the second inequality of (P2) the difference in brackets is non-negative. Then using the policy iteration algorithm the aforementioned threshold cannot be exceeded and the threshold is always finite.
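The reduction used in this last step can be spelled out. In the sketch below, $l$ abbreviates the threshold $l^*_{\pi_n}$ and $D$ abbreviates the bracketed difference $J_{\pi_n}(l^*_{\pi_n}+1,1) - J_{\pi_n}(l^*_{\pi_n},1)$. Both are shorthand introduced only here, and $p_k > 0$ is assumed so that one may divide by it.

```latex
\begin{align*}
-\delta p_k + p_k\,(l + \delta p_{k+1}) + p_k \lambda D - p_k (1-\lambda) &\ge 0 \\
l + \delta p_{k+1} - \delta + \lambda D &\ge 1 - \lambda
  && \text{(divide by } p_k > 0\text{)} \\
l + \lambda D &\ge \delta\,(1 - p_{k+1}) + (1-\lambda)
  = \delta q_{k+1} + (1-\lambda) \\
\intertext{Since $D \ge 0$ and $q_{k+1} \le \max_k q_{k+1}$, a sufficient condition for all $k$ is}
l &\ge \delta \max_k q_{k+1} + (1-\lambda).
\end{align*}
```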



D. Proof of Theorem 3

Proof: Assume that the policy iteration algorithm is initialized by $\pi_0$ as described in the proof of the previous theorem. The optimal policy will be $\pi^* = \pi_0$ if the threshold $l^*_{\pi_n} = 0$, $\forall n$. Assume that $\pi_n = \pi_0$. Let us first bound the difference $\Delta J_{\pi_{n+1}}(l,k)$. Since $J_{\pi_n}(l+1,1) - J_{\pi_n}(l,1)$ is increasing in $l$, $\forall l \ge 0$, then

$$\Delta J_{\pi_{n+1}}(l,k) \ge -\delta p_k + \beta p_k\lambda\left[J_{\pi_n}(2,k+1) - J_{\pi_n}(1,1)\right] + \beta p_k(1-\lambda)\left[J_{\pi_n}(1,k+1) - J_{\pi_n}(0,1)\right]$$
Omitting the details of the calculations, the following bound can be derived

$$\Delta J_{\pi_{n+1}}(l,k) \ge -\delta p_k + \beta p_k\left[1 + \delta(p_{k+1} - p_1)\right] + \beta p_k\frac{\lambda}{1-\beta\lambda}\left[1 + \beta(1-\lambda)(1+\delta p_1)\right]$$

Then, if the right-hand side is $\ge 0$, it follows that $\Delta J_{\pi_{n+1}}(l,k) \ge 0$, $\forall (l,k)$. By simple calculations we get the expression in (13). The bound can be further written as $1/\{(1-\lambda)[1 + (1-\lambda)p_1 - \min_{k>1} p_k]\} + \lambda/\{1 + (1-\lambda)p_1 - \min_{k>1} p_k\}$, clearly non-decreasing w.r.t. $\lambda$ and tending to $\infty$ for $\lambda \to 1$. ■
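The closed form quoted at the end of the proof is easy to evaluate. The Python sketch below computes it for a given arrival rate and failure-probability sequence. It reads the minimum as taken over $k > 1$, which is an assumption about the index range, and interprets the value as the level of $\delta$ up to which always dropping remains optimal for $\beta \to 1$. The name `always_drop_bound` is hypothetical.

```python
def always_drop_bound(lam, p):
    """Evaluate 1/((1-lam)*d) + lam/d with d = 1 + (1-lam)*p_1 - min_{k>1} p_k.

    p[0] is p_1 and p[1:] are p_2, p_3, ... (per-attempt failure probabilities).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("arrival rate must satisfy 0 <= lam < 1")
    if len(p) < 2:
        raise ValueError("need at least p_1 and p_2")
    p1 = p[0]
    p_min = min(p[1:])  # assumption: the minimum runs over k > 1
    d = 1.0 + (1.0 - lam) * p1 - p_min
    if d <= 0.0:
        raise ValueError("denominator must be positive")
    return 1.0 / ((1.0 - lam) * d) + lam / d
```

For example, with `lam = 0.6` and `p = [0.5, 0.5]` the denominator is `0.7` and the bound is about `4.43`. Increasing `lam` towards 1 makes the bound grow without limit, in line with the remark above.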

E. Proof of Theorem 4

Proof: Let us show first that the two inequalities hold for policy $\pi_0$, where only dropping is chosen as action for all states. Inequality (15) is easy to verify since $J_{\pi_0}(l,k) = C_2(l,k) = l + \delta p_k + \beta\lambda J_{\pi_0}(l,1) + \beta(1-\lambda)J_{\pi_0}(l-1,1) \ge J_{\pi_0}(l,k+m)$, since $p_k \ge p_{k+m}$, $\forall k \in \mathbb{Z}_+$. For the inequality (14) observe from the previous that $J_{\pi_0}(l,k) \ge J_{\pi_0}(l,k+m)$ holds for $m \to \infty$. We prove then that $J_{\pi_0}(l,k+m|_{m\to\infty}) \ge J_{\pi_0}(l-1,1)$.

$$J_{\pi_0}(l,k+m|_{m\to\infty}) - J_{\pi_0}(l-1,1) = 1 - \delta p_1 + \beta\lambda\left[J_{\pi_0}(l,1) - J_{\pi_0}(l-1,1)\right] + \beta(1-\lambda)\left[J_{\pi_0}(l-1,1) - J_{\pi_0}(l-2,1)\right] \overset{(a)}{\ge} 1 - \delta p_1 + \beta(1+\delta p_1) \overset{(b)}{\ge} 0$$

where (a) comes from Theorem 2, since the difference $J_{\pi_0}(l,1) - J_{\pi_0}(l-1,1)$ is monotone non-decreasing and $J_{\pi_0}(l,1) - J_{\pi_0}(l-1,1) \ge J_{\pi_0}(1,1) - J_{\pi_0}(0,1) = 1 + \delta p_1$, and (b) holds for $\beta \ge 1 - 1/(1+\delta p_1)$, which tends to 1 as $\delta \to \infty$.

Assume now that the policy iteration algorithm is initialized with policy $\pi_0$ and the above two inequalities hold for $\pi_n$. We will prove that the same holds for $\pi_{n+1}$, and hence the inequalities also hold for the optimal policy $\pi^*$ as $n \to \infty$. Let us first consider (14)

$$\min\{C_1^{\pi_{n+1}}(l,k),\, C_2^{\pi_{n+1}}(l,k)\} - J_{\pi_{n+1}}(l-1,1) \overset{(c)}{\ge} \min\{C_1^{\pi_{n+1}}(l,k),\, C_2^{\pi_{n+1}}(l,k)\} - J_{\pi_n}(l-1,1) \overset{\beta\to 1,\,(d)}{\ge} 0$$

where (c) follows from the property of the policy iteration algorithm that $J_{\pi_{n+1}} \le J_{\pi_n}$ (for a proof the reader is referred to [18, Prop. 6.4.1, p. 175]) and (d) comes from the induction argument and is easy to verify for $\beta \to 1$.

We continue to (15) and consider the two cases where (i) $\pi_{n+1}(l,k) = 1$, or (ii) $\pi_{n+1}(l,k) = 0$.

$$J_{\pi_{n+1}}(l,k) = C_2^{\pi_{n+1}}(l,k) \overset{(e)}{\ge} C_2^{\pi_{n+1}}(l,k+m) \ge J_{\pi_{n+1}}(l,k+m)$$

where (e) is for the case (i) due to the fact that $p_k \ge p_{k+m}$. For case (ii) we have the same as above by changing the indices 2 with 1 and (i) with (ii). Inequality (e) now follows from the fact that $p_k \ge p_{k+m}$ and $J_{\pi_n}(l,k) - J_{\pi_n}(l-1,1) \ge 0$ from the induction hypothesis.

For the 'Furthermore' part of the Theorem, the proof follows the same lines as in that of Theorem 1, where inequality (15) is used in place of (10). ■

Tables

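TABLE I

Transition probabilities $\Pr(\cdot \mid S_n, A_n)$ of the Markov Decision Process under study

| Case | $a_n = 0$ | $a_n = 1$ |
|---|---|---|
| $A_n = 0$, $U_n > 0$, ACK | $\Pr((U_n-1,1) \mid S_n,0) = q(Z_n)(1-\lambda)$ | $\Pr((U_n,1) \mid S_n,0) = q(Z_n)\lambda$ |
| $A_n = 0$, $U_n > 0$, NACK | $\Pr((U_n,Z_n+1) \mid S_n,0) = p(Z_n)(1-\lambda)$ | $\Pr((U_n+1,Z_n+1) \mid S_n,0) = p(Z_n)\lambda$ |
| $A_n = 1$, $U_n > 0$ | $\Pr((U_n-1,1) \mid S_n,1) = 1-\lambda$ | $\Pr((U_n,1) \mid S_n,1) = \lambda$ |
| $A_n \in \{0,1\}$, $U_n = 0$ | $\Pr((U_n,Z_n) \mid S_n,A_n) = 1-\lambda$ | $\Pr((U_n+1,1) \mid S_n,A_n) = \lambda$ |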
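The entries of Table I translate directly into a transition function. The Python sketch below returns the next-state distribution for a state $(U_n, Z_n)$ and an action, with action 0 read as continue and 1 as drop. The function name `transitions` and the representation of $q(\cdot)$ as a callable are choices made here for illustration only.

```python
def transitions(state, action, q, lam):
    """Return {next_state: probability} for state (u, z) under the given action.

    action 0 = continue, action 1 = drop. q(z) is the success probability of
    the z-th attempt, p(z) = 1 - q(z), and lam is the per-slot arrival probability.
    """
    u, z = state
    if u == 0:
        # Empty queue: nothing to send, only an arrival can change the state.
        return {(0, z): 1.0 - lam, (1, 1): lam}
    if action == 1:
        # Drop: the head-of-line packet leaves after this attempt, whatever the feedback.
        return {(u - 1, 1): 1.0 - lam, (u, 1): lam}
    qz = q(z)
    pz = 1.0 - qz
    return {
        (u - 1, 1): qz * (1.0 - lam),   # ACK, no arrival
        (u, 1): qz * lam,               # ACK, arrival
        (u, z + 1): pz * (1.0 - lam),   # NACK, no arrival
        (u + 1, z + 1): pz * lam,       # NACK, arrival
    }
```

Each returned distribution sums to one, which is a quick consistency check on the table.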



TABLE II

Value Iteration for Finite State Space

Figures

[Figure 1 labels: $\Pr[X_n = 0 \mid Z_n = k] = p_k$; $A_n = \{0,1\}$; $\Pr[X_n = 1 \mid Z_n = k] = q_k$]

Fig. 1. A single server queue incorporating a retransmission protocol for the erroneous packets.

Fig. 2. Transition probability diagram for the random process $\{Z_n\}$ of current retransmissions. The ARQ Markov chain is considered time-homogeneous with countably infinite states.




Fig. 3. The optimal and $\langle L,K\rangle$-suboptimal delay-dropping tradeoff for decreasing sequence of success probabilities.

[Plot: Delay vs. dropping cost ($\lambda = 0.6$, $q_k = \exp(-0.3k)$, $K = 10$, $L = 35$); horizontal axis: dropping cost $\delta$, ticks from 500 to 5000]



Fig. 4. Average queue length over increasing dropping cost for (a) the optimal policy and (b) the $\langle L,K\rangle$-policies. The case of decreasing sequence of success probabilities is illustrated.

Fig. 5. Decrease of average dropping with increasing dropping cost for (a) the optimal policy and (b) the $\langle L,K\rangle$-policies. The case of decreasing sequence of success probabilities is illustrated.

Fig. 6. The optimal and $\langle L,K\rangle$-suboptimal delay-dropping tradeoff for increasing sequence of success probabilities.

[Plot; horizontal axis: dropping cost $\delta$]



Fig. 7. Increase of average queue length with increasing dropping cost for (a) the optimal policy and (b) the $\langle L,K\rangle$-policies. The case of increasing sequence of success probabilities is illustrated.

[Plot: Dropping percentage vs. dropping cost ($\lambda = 0.6$, $q_k = 1-\exp(-0.5k)$, $K = 10$, $L = 40$)]

Fig. 8. Decrease of average dropping with increasing dropping cost for (a) the optimal policy and (b) the $\langle L,K\rangle$-policies. The case of increasing sequence of success probabilities is illustrated.